In the light of self-supervised approaches, we revisit watermarking techniques based on pre-trained deep networks. We present a way to embed both marks and binary messages into their latent spaces, leveraging data augmentation at marking time. Our method can operate at any resolution and creates watermarks robust to a broad range of transformations (rotations, crops, JPEG, contrast, etc.). It significantly outperforms previous zero-bit methods, and its performance on multi-bit watermarking is on par with state-of-the-art encoder-decoder architectures trained end-to-end for watermarking. Our implementation and models will be made publicly available.
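As a concrete illustration of marking through a latent space, the sketch below optimizes a small image perturbation so that the features of augmented copies align with a secret carrier direction, which a zero-bit detector can later test. This is a minimal sketch under assumed names (`backbone`, `carrier`, the threshold `tau`), not the paper's exact procedure.

```python
import torch

def mark(img, backbone, carrier, steps=100, lr=0.01, budget=4/255):
    """img: (1,3,H,W) in [0,1]; carrier: unit-norm vector in feature space."""
    delta = torch.zeros_like(img, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x = img + delta
        if torch.rand(()) < 0.5:              # data augmentation at marking time
            x = torch.flip(x, dims=[3])       # (here: a random horizontal flip)
        feat = backbone(x).flatten(1)
        feat = feat / feat.norm(dim=1, keepdim=True)
        loss = -(feat @ carrier).mean()       # push the feature toward the carrier
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-budget, budget)     # keep the distortion bounded
    return (img + delta).clamp(0, 1).detach()

def detect(img, backbone, carrier, tau=0.5):
    """Zero-bit decision: is the feature close enough to the carrier?"""
    feat = backbone(img).flatten(1)
    feat = feat / feat.norm(dim=1, keepdim=True)
    return (feat @ carrier).item() > tau      # tau is an assumed threshold
```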
Modern approaches for fast retrieval of similar vectors on billion-scale datasets rely on compressed-domain methods such as binary sketches or product quantization. These methods minimize a certain loss, typically the mean squared error or another objective function tailored to the retrieval problem. In this paper, we re-interpret popular methods such as binary hashing or product quantizers as auto-encoders, and point out that they implicitly make suboptimal assumptions on the form of the decoder. We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes, which translates to better performance in nearest neighbor search. Our method significantly improves over binary hashing and product quantization on popular benchmarks.
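The auto-encoder reading can be made concrete with binary sketches: the encoder sign(Wx) stays fixed, and only the decoder changes. The toy comparison below, with assumed dimensions and random data, contrasts a naive pseudo-inverse decoder with a least-squares decoder fit on training vectors; the paper's actual decoders are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)
d, bits, n = 64, 32, 10000
X = rng.standard_normal((n, d)).astype(np.float32)
W = rng.standard_normal((d, bits)).astype(np.float32)

codes = np.sign(X @ W)                         # encoder, identical in both cases

naive = codes @ np.linalg.pinv(W)              # decode pretending code ≈ xW
B, *_ = np.linalg.lstsq(codes, X, rcond=None)  # decoder fit by least squares
learned = codes @ B

print("naive MSE:  ", float(np.mean((X - naive) ** 2)))
print("learned MSE:", float(np.mean((X - learned) ** 2)))
```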
Latent text representations exhibit geometric regularities, such as the famous analogy: queen is to king what woman is to man. Such structured semantic relationships have not been demonstrated on image representations. Recent works aiming at bridging this semantic gap embed images and text into a multimodal space, enabling the transfer of text-defined transformations to the image modality. We introduce the SIMAT dataset to evaluate the task of text-driven image transformation. SIMAT contains 6k images and 18k "transformation queries" that aim at either replacing scene elements or changing their pairwise relationships. The goal is to retrieve an image consistent with the (source image, transformation) query. We use an image/text matching oracle (OSCAR) to assess whether the image transformation is successful. The SIMAT dataset will be publicly available. We use SIMAT to show that vanilla CLIP multimodal embeddings are not well suited for text-driven image transformation, but that a simple finetuning on the COCO dataset can bring dramatic improvements. We also study whether it is beneficial to leverage the geometric properties of pretrained universal sentence encoders (FastText, LASER and LaBSE).
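A minimal sketch of the embedding-arithmetic idea being evaluated: move the source image embedding along the direction between two text embeddings and retrieve the nearest gallery image. The encoders are stand-ins for CLIP-style models, and `lam` is an assumed scaling hyper-parameter.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def transform_query(img_emb, txt_src, txt_tgt, gallery, lam=1.0):
    """All inputs live in a shared multimodal embedding space.
    gallery: (N, d) L2-normalized image embeddings."""
    query = normalize(img_emb + lam * (txt_tgt - txt_src))
    return int(np.argmax(gallery @ query))     # cosine-similarity retrieval

# Demo with random stand-ins for real embeddings:
rng = np.random.default_rng(0)
d, N = 512, 1000
gallery = normalize(rng.standard_normal((N, d)))
img, src, tgt = (normalize(rng.standard_normal(d)) for _ in range(3))
print(transform_query(img, src, tgt, gallery))
```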
We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular activation maps with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information in vision transformers. As a result, we propose LeViT: a hybrid neural network for fast inference image classification. We consider different measures of efficiency on different hardware platforms, so as to best reflect a wide range of application scenarios. Our extensive experiments empirically validate our technical choices and show they are suitable to most architectures. Overall, LeViT significantly outperforms existing convnets and vision transformers with respect to the speed/accuracy tradeoff. For example, at 80% ImageNet top-1 accuracy, LeViT is 5 times faster than EfficientNet on CPU. We release the code at https://github.com/facebookresearch/LeViT.
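The attention bias can be sketched as a per-head table of learnable scalars indexed by relative (dx, dy) offsets and added to the attention logits; the class and shapes below are illustrative rather than LeViT's exact code.

```python
import torch
import torch.nn as nn

class AttentionBias(nn.Module):
    """Learnable per-head bias indexed by relative (dx, dy) offsets."""
    def __init__(self, num_heads, resolution):
        super().__init__()
        pts = [(i, j) for i in range(resolution) for j in range(resolution)]
        offsets, idxs = {}, []
        for p in pts:
            for q in pts:
                off = (abs(p[0] - q[0]), abs(p[1] - q[1]))
                offsets.setdefault(off, len(offsets))
                idxs.append(offsets[off])
        # One scalar per head and per distinct relative offset.
        self.bias = nn.Parameter(torch.zeros(num_heads, len(offsets)))
        self.register_buffer("idx", torch.tensor(idxs).view(len(pts), len(pts)))

    def forward(self, attn_logits):            # attn_logits: (B, heads, N, N)
        return attn_logits + self.bias[:, self.idx]

# Usage: logits = (q @ k.transpose(-2, -1)) * scale; logits = bias(logits)
```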
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers by training on ImageNet only. We train them on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. More importantly, we introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention. We show the interest of this token-based distillation, especially when using a convnet as a teacher. This leads us to report results competitive with convnets both on ImageNet (where we obtain up to 85.2% accuracy) and when transferring to other tasks. We share our code and models.
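The token-based distillation reduces to a simple two-head loss; the hard-label variant is sketched below, where the student interface returning separate class-token and distillation-token logits is an assumption.

```python
import torch
import torch.nn.functional as F

def deit_style_loss(student, teacher, images, labels):
    cls_logits, dist_logits = student(images)   # class-token and distillation-token heads
    with torch.no_grad():
        teacher_labels = teacher(images).argmax(dim=1)   # hard teacher decision
    loss_cls = F.cross_entropy(cls_logits, labels)
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)
    return 0.5 * (loss_cls + loss_dist)
```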
Clustering is a class of unsupervised learning methods that has been extensively applied and studied in computer vision. Little work has been done to adapt it to the end-to-end training of visual features on large-scale datasets. In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the subsequent assignments as supervision to update the weights of the network. We apply DeepCluster to the unsupervised training of convolutional neural networks on large datasets like ImageNet and YFCC100M. The resulting model outperforms the current state of the art by a significant margin on all the standard benchmarks.
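The alternation DeepCluster describes can be sketched in a few lines: a feature pass, a k-means step whose assignments become pseudo-labels, and a supervised update. Details such as feature PCA/whitening and cluster re-balancing are omitted here.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def deepcluster_epoch(model, classifier, loader, optimizer, k=1000):
    """loader must iterate in a fixed order so pseudo-labels line up."""
    # 1) Feature pass over the whole dataset with the current network.
    model.eval()
    with torch.no_grad():
        feats = torch.cat([model(x).flatten(1) for x, _ in loader])
    # 2) k-means on the features; assignments become pseudo-labels.
    pseudo = torch.from_numpy(
        KMeans(n_clusters=k, n_init=1).fit_predict(feats.numpy())).long()
    # 3) Supervised step on the pseudo-labels.
    model.train()
    offset = 0
    for x, _ in loader:
        y = pseudo[offset:offset + x.size(0)]
        offset += x.size(0)
        loss = F.cross_entropy(classifier(model(x).flatten(1)), y)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
```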
Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5× faster than prior GPU state of the art. We apply it in different similarity search scenarios, by proposing optimized designs for brute-force, approximate and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
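The open-sourced library is faiss; assuming its GPU build is installed, a brute-force search along the lines of the abstract looks as follows (product-quantization indexes are drop-in alternatives to the flat index).

```python
import numpy as np
import faiss

d = 128
xb = np.random.rand(100_000, d).astype(np.float32)   # database vectors
xq = np.random.rand(10_000, d).astype(np.float32)    # query vectors

res = faiss.StandardGpuResources()
index = faiss.GpuIndexFlatL2(res, d)      # brute-force L2 search on one GPU
index.add(xb)
D, I = index.search(xq, 10)               # distances and ids of the 10-NN
```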
Event logs are widely used for anomaly detection and prediction in complex systems. Existing log-based anomaly detection methods usually comprise four main steps: log collection, log parsing, feature extraction, and anomaly detection, where the feature extraction step extracts useful features for anomaly detection by counting log events. For a complex system, such as a lithography machine consisting of a large number of subsystems, the logs may contain thousands of different events, resulting in abundant extracted features. However, when anomaly detection is performed at the subsystem level, analyzing all features becomes expensive and unnecessary. To mitigate this problem, we develop a feature selection method for log-based anomaly detection and prediction, which largely improves effectiveness and efficiency.
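A hedged sketch of the count-based pipeline: each window of log events becomes a vector of per-event counts, and a simple variance filter stands in for the paper's selection criterion, which is not reproduced here.

```python
from collections import Counter
import numpy as np

def count_features(windows, vocabulary):
    """windows: list of lists of event ids; returns (n_windows, n_events)."""
    X = np.zeros((len(windows), len(vocabulary)))
    index = {e: i for i, e in enumerate(vocabulary)}
    for r, window in enumerate(windows):
        for event, c in Counter(window).items():
            X[r, index[event]] = c
    return X

def select_features(X, min_variance=1e-3):
    keep = X.var(axis=0) > min_variance    # drop near-constant event counts
    return X[:, keep], np.flatnonzero(keep)
```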
Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. However, many real-world problems exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to decompose large networked systems of many agents into multiple local components, such that we can build separate simulators that run independently and in parallel. To monitor the influence that the different local components exert on one another, each of these simulators is equipped with a learned model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours, but also helps mitigate the negative effects of simultaneous learning.
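Schematically, each local simulator advances its own component while a learned model supplies the influence of the rest of the system, refit periodically on real trajectories; all names below are illustrative, not the paper's API.

```python
def run_local_simulator(component, influence_model, real_trajectories,
                        steps, refit_every=1000):
    """One of several simulators running independently and in parallel."""
    state = component.reset()
    for t in range(steps):
        # Predicted effect of the other components on this one.
        external = influence_model.predict(state)
        state = component.step(state, external)   # purely local transition
        if t % refit_every == 0 and real_trajectories:
            # Periodically ground the model in real (full-system) data.
            influence_model.fit(real_trajectories)
    return state
```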
Rule set learning has long been studied and has recently been frequently revisited due to the need for interpretable models. Still, existing methods have several shortcomings: 1) the most recent methods require a binary feature matrix as input, while learning rules directly from numeric variables is understudied; 2) existing methods impose orders among rules, either explicitly or implicitly, which harms interpretability; and 3) currently no method exists for learning probabilistic rule sets for multi-class target variables (there is only a method for probabilistic rule lists). We propose TURS, for Truly Unordered Rule Sets, to address these shortcomings. We first formalize the problem of learning truly unordered rule sets. To resolve conflicts caused by overlapping rules, i.e., instances covered by multiple rules, we propose a novel approach that exploits the probabilistic properties of our rule sets. Next, we develop a two-phase heuristic algorithm that learns rule sets by carefully growing rules. An important innovation is that we use a surrogate score to take the global potential of the rule set into account when learning a local rule. Finally, we empirically demonstrate that, compared with non-probabilistic and (explicitly or implicitly) ordered state-of-the-art methods, our method learns rule sets that not only have better interpretability (i.e., they are smaller and truly unordered), but also better predictive performance.
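Prediction with a truly unordered rule set can be sketched as follows: every covering rule contributes its class distribution, overlaps are resolved probabilistically (here by simple averaging, a simplification of the paper's treatment), and a default rule handles uncovered instances.

```python
import numpy as np

class Rule:
    def __init__(self, condition, class_probs):
        self.covers = condition              # function: instance -> bool
        self.probs = np.asarray(class_probs)

def predict_proba(rules, default_probs, x):
    covering = [r.probs for r in rules if r.covers(x)]
    if not covering:                         # fall back to the default rule
        return np.asarray(default_probs)
    return np.mean(covering, axis=0)         # resolve overlap among rules

# Example: two overlapping rules on a numeric feature vector x.
rules = [Rule(lambda x: x[0] > 2, [0.9, 0.1]),
         Rule(lambda x: x[1] < 0, [0.2, 0.8])]
print(predict_proba(rules, [0.5, 0.5], np.array([3.0, -1.0])))  # both cover x
```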